Reinforcement Learning with Function Approximation Converges to a Region
Abstract
Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are counterexamples showing that the adjustable weights in some algorithms may oscillate within a region rather than converging to a point. This paper shows that, for two popular algorithms, such oscillation is the worst that can happen: the weights cannot diverge, but instead must converge to a bounded region. The algorithms are SARSA(0) and V(0); the latter algorithm was used in the well-known TD-Gammon program.
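For reference, the weight update the boundedness result concerns can be written as a minimal sketch with linear function approximation (the function name, signature, and step-size values below are illustrative, not taken from the paper):

```python
import numpy as np

def sarsa0_update(w, phi_sa, reward, phi_next_sa, gamma=0.9, alpha=0.1):
    """One SARSA(0) step with a linear approximator q(s, a) = w @ phi(s, a).

    On-policy: phi_next_sa is the feature vector of the state-action pair
    actually taken next. The paper's result is that iterating this update
    keeps w inside a bounded region, even though w need not converge to
    a single point.
    """
    td_error = reward + gamma * (w @ phi_next_sa) - (w @ phi_sa)
    return w + alpha * td_error * phi_sa
```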
Similar References
Reinforcement Learning with Linear Function Approximation and LQ control Converges
Reinforcement learning is commonly used with function approximation. However, very few positive results are known about the convergence of function-approximation-based RL control algorithms. In this paper we show that TD(0) and Sarsa(0) with linear function approximation are convergent for a simple class of problems, where the system is linear and the costs are quadratic (the LQ control problem)...
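A sketch of why linear function approximation fits the LQ setting: the cost-to-go of an LQ problem is quadratic in the state, so it is exactly linear in quadratic monomial features (the names and step sizes here are illustrative, not from the paper):

```python
import numpy as np

def quadratic_features(x):
    # Monomials x_i * x_j (upper triangle): any quadratic cost-to-go
    # x' P x is exactly linear in these features.
    outer = np.outer(x, x)
    return outer[np.triu_indices(len(x))]

def td0_lq_update(w, x, cost, x_next, gamma=0.95, alpha=0.01):
    # Standard TD(0) step on the quadratic feature representation.
    phi, phi_next = quadratic_features(x), quadratic_features(x_next)
    delta = cost + gamma * (w @ phi_next) - (w @ phi)
    return w + alpha * delta * phi
```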
Barycentric Interpolators for Continuous Space and Time Reinforcement Learning
In order to find the optimal control of continuous state-space and time reinforcement learning (RL) problems, we approximate the value function (VF) with a particular class of functions called barycentric interpolators. We establish sufficient conditions under which an RL algorithm converges to the optimal VF, even when we use approximate models of the state dynamics and the reinforcement function...
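In two dimensions the interpolator the abstract refers to reduces to ordinary barycentric interpolation on a triangle; a minimal sketch of that scheme (the names and setup are mine, illustrating the general idea rather than the paper's code):

```python
import numpy as np

def barycentric_coords(p, tri):
    # Coordinates (l1, l2, l3) with p = l1*A + l2*B + l3*C and sum 1;
    # all nonnegative when p lies inside the triangle tri = [A, B, C].
    T = np.column_stack((tri[0] - tri[2], tri[1] - tri[2]))
    l12 = np.linalg.solve(T, p - tri[2])
    return np.append(l12, 1.0 - l12.sum())

def interpolated_value(p, tri, v_at_vertices):
    # V(p) = sum_i lambda_i(p) * V(xi_i): a convex combination of the
    # stored vertex values.
    return barycentric_coords(p, tri) @ v_at_vertices
```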
TD(0) Converges Provably Faster than the Residual Gradient Algorithm
In Reinforcement Learning (RL) there has been some experimental evidence that the residual gradient algorithm converges more slowly than the TD(0) algorithm. In this paper, we use the concept of asymptotic convergence rate to prove that, under certain conditions, the synchronous off-policy TD(0) algorithm converges faster than the synchronous off-policy residual gradient algorithm if the value function...
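For concreteness, the two updates being compared, written for a linear value function v(s) = w @ phi(s) (a sketch; the names and step sizes are assumptions, not the paper's):

```python
import numpy as np

def td0_step(w, phi, r, phi_next, gamma=0.9, alpha=0.1):
    # TD(0): bootstrapped update; only the current state's features
    # scale the error.
    delta = r + gamma * (w @ phi_next) - (w @ phi)
    return w + alpha * delta * phi

def residual_gradient_step(w, phi, r, phi_next, gamma=0.9, alpha=0.1):
    # Exact gradient descent on the squared Bellman residual 0.5 * delta**2:
    # the successor state's features enter the update as well.
    delta = r + gamma * (w @ phi_next) - (w @ phi)
    return w + alpha * delta * (phi - gamma * phi_next)
```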
Kernel-Based Models for Reinforcement Learning
Model-based approaches to reinforcement learning exhibit low sample complexity while learning nearly optimal policies, but they are generally restricted to finite domains. Meanwhile, function approximation addresses continuous state spaces but typically weakens convergence guarantees. In this work, we develop a new algorithm that combines the strengths of Kernel-Based Reinforcement Learning...